ABSTRACT

Short abstracts of works in progress or completed in
the Department of Computing Science at the University of Alberta are
presented under five major headings. The five categories are: storage
and search techniques for document data bases; automatic
classification; study of indexing and classification languages
through computer manipulation of data bases; library automation; and
information transfer processes and national networks. Faculty and
student names and document titles are provided. (SJ)
1. INTRODUCTION

In the Department of Computing Science at the
University of Alberta a group has for several years
studied theoretical and operational aspects of information
retrieval and library automation. The group has cooperated
with the Alberta Research Council. Computer programs have
been developed for use on several document data bases for
current awareness and retrospective search. Programs have
also been developed for library automation and used on an
experimental basis. Research has also been undertaken in
[illegible]. The group is primarily concerned with the computing
and educational aspects of information processing.
Results have been reported in the journal literature,
in M.Sc. and Ph.D. theses, and at international meetings.
The Department has students engaged in information retrieval
research at the M.Sc. and Ph.D. level.

Sections 2-6 below contain brief descriptions of
work completed since 1968. In most instances each paragraph
is an abstract, or combination of abstracts, that appeared in
publications or reports indicated in parentheses at the
end of the paragraph. The references are to the publications
listed in Section 8, to the theses of Section 9, or to the
reports [illegible], which are available on request.

Work in progress is included at the end of each
section. Further details may be obtained directly from the
person concerned.

Appreciation is expressed to the National Research
Council of Canada for financial support of some of the projects.
Acknowledgment of support from other groups is included in
the project descriptions.
2. STORAGE AND SEARCH TECHNIQUES FOR
DOCUMENT DATA BASES

2.1 - Coding and Storage Techniques and Retrospective Searches

Between 1968 and 1971 computer searches were performed
fortnightly on the Chemical Titles tapes as a service for
current awareness. A cost function was formulated based on
the number of titles searched, the length of the questions,
the extent to which questions were batched, and certain details
regarding the allowed forms of question. It was suggested
that the search procedure should be designed to minimize
computation time at the expense of convenience in the form of
output, but that facilities should be included so that the
user who is willing to pay the additional cost could receive
output in a more convenient form. (L.H. Thiel and H.S. Heaps,
Report 1968; H.S. Heaps and L.H. Thiel 1970).

An investigation was made of methods of abbreviation 
of English words to standard length for computer processing. 
Five methods of sectionalizing the data were tested on three 
vocabularies. The first vocabulary consisted of 63,316 
title words, authors, or codens from the Chemical Titles 
tapes. The second vocabulary consisted of 6,354 terms from 
the MARC tapes. The third vocabulary consisted of 10,804 
words or phrases chosen from the Dictionary of Canadian 
English. Abbreviation techniques were tested for the effect
of inclusion of length digit, check digit, and ordering of 
letters selected for inclusion in the abbreviation codes. It 
was found that abbreviation codes may be chosen to provide a 
high degree of discrimination for the data bases examined. 
(R.L. Treleaven, Thesis 1970). 
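
The abstract does not state the abbreviation rules that were tested,
so the following is only a minimal sketch of the general idea: a word
is reduced to a fixed-length code made of selected letters, a length
digit, and a check digit. The letter-selection rule (keep the first
letter, then prefer consonants), the padding character, and the
check-digit formula are all assumptions chosen for illustration, and
the function name `abbreviate` is hypothetical.

```python
def abbreviate(word: str, n_letters: int = 4) -> str:
    """Return a fixed-length code: n_letters letters + length digit + check digit.

    Hypothetical scheme for illustration only; not the thesis's method.
    """
    w = "".join(ch for ch in word.upper() if ch.isalpha())
    if not w:
        raise ValueError("word contains no letters")
    # Keep the first letter, then the remaining consonants in their original order.
    selected = w[0] + "".join(ch for ch in w[1:] if ch not in "AEIOU")
    # Truncate or pad so that every code has the same number of letters.
    letters = selected[:n_letters].ljust(n_letters, "_")
    # Length digit: word length, capped at 9 so that it fits in one digit.
    length_digit = str(min(len(w), 9))
    # Check digit: sum of the alphabet positions of all letters, modulo 10.
    check_digit = str(sum(ord(ch) - ord("A") + 1 for ch in w) % 10)
    return letters + length_digit + check_digit
```

Under these assumed rules "acoustics" and "acoustical" share the same
four letters and the same capped length digit, and are separated only
by the check digit, which illustrates why the extra digits can improve
discrimination.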

Retrospective search of large document data bases 
requires development of special techniques for automatic
compression of data and minimization of the number of input- 
output operations to the computer accessible files. Also 
the computer program should be designed to require a 
relatively small amount of internal memory. A description 
has been given of a program that meets these requirements.
The vocabulary of the data base was automatically expressed 
in terms of 8, 16, and 24 bit codes chosen to point to the
natural spelling in a dictionary. Thus file size was reduced 
without the necessity for extensive processing for decoding. 
Use of a compressed bit string inverted index greatly reduced 
search time, and a storage management system enabled long 
strings to be processed with use of a limited amount of 
internal storage. Creation of "reduced" files and tables was 
an important feature of the program; it allowed those files
needed only by specific phases of the program to be designed
to use a relatively small amount of internal storage and
input-output time. (H.S. Heaps 1970a; L.H. Thiel and H.S.
Heaps 1972).
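
One way to picture codes of 8, 16, and 24 bits that point into a
dictionary is sketched below. The layout is an assumption, not the
published design: terms are ranked by frequency, the 128 most
frequent receive one-byte codes, the next 16,384 receive two-byte
codes, and the rest receive three-byte codes, with the leading bits
of the first byte indicating the code length. All function names are
hypothetical.

```python
from collections import Counter

ONE = 128          # number of terms that fit in an 8-bit code  (0xxxxxxx)
TWO = 64 * 256     # number of terms that fit in a 16-bit code (10xxxxxx xxxxxxxx)

def build_dictionary(tokens):
    """Rank terms by descending frequency; a term's code is its rank."""
    counts = Counter(tokens)
    return sorted(counts, key=lambda t: (-counts[t], t))

def encode_rank(rank: int) -> bytes:
    """Encode a dictionary rank as a 1, 2, or 3 byte code."""
    if rank < ONE:
        return bytes([rank])
    rank -= ONE
    if rank < TWO:
        return bytes([0x80 | (rank >> 8), rank & 0xFF])
    rank -= TWO
    if rank >= 64 * 65536:
        raise ValueError("vocabulary too large for 24-bit codes")
    return bytes([0xC0 | (rank >> 16), (rank >> 8) & 0xFF, rank & 0xFF])

def encode(tokens, dictionary):
    """Replace each token by the code that points to its dictionary entry."""
    index = {term: i for i, term in enumerate(dictionary)}
    return b"".join(encode_rank(index[t]) for t in tokens)

def decode(data: bytes, dictionary):
    """Recover the natural spelling of each term by table lookup."""
    out, i = [], 0
    while i < len(data):
        first = data[i]
        if first < 0x80:
            rank = first
            i += 1
        elif first < 0xC0:
            rank = ONE + (((first & 0x3F) << 8) | data[i + 1])
            i += 2
        else:
            rank = ONE + TWO + (((first & 0x3F) << 16) | (data[i + 1] << 8) | data[i + 2])
            i += 3
        out.append(dictionary[rank])
    return out
```

Decoding here is a single table lookup per code, which is consistent
with the remark that file size was reduced without extensive
processing for decoding.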

Analysis has been made of the effect of using an 
efficient code for compression of terms within a document 
data base. Storage efficiency was expressed in terms of 
the vocabulary length and the values of certain parameters 
which describe the structure of the code. For vocabularies 
of up to 100,000 terms the average code length is approxi-
mately twelve bits. No information is lost through term
truncation or abbreviation. It was shown that the tables 
required for coding and decoding may be ordered for rapid 
access without reduction in the ease of update. (H.S. Heaps 1972a). 

Work in progress by E. Schuegraf is a theoretical and 
experimental analysis of the use of word fragments as the 
basic vocabulary terms for information retrieval from large 
document data bases. Emphasis is being placed on consider-
ations that affect storage requirements, search times, and
query language capabilities. The investigation includes
selection of key fragments from a set of 1, 2, 3, 4, 5, and
6-character sequences, choice of a method of coding the terms,
storage of the inverted index, suppression of false bits,
decoding of titles, and storage requirements consistent with
small search times.
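
As a rough illustration of what selecting key fragments from
sequences of one to six characters could involve, the sketch below
counts every such sequence in a word list and keeps the highest
scoring ones. The scoring rule (occurrences multiplied by fragment
length) and the decision to always retain single characters are
assumptions made for the example; the work in progress may use quite
different criteria. The function names are hypothetical.

```python
from collections import Counter

def fragment_counts(words, max_len=6):
    """Count every character sequence of length 1..max_len inside each word."""
    counts = Counter()
    for word in words:
        for n in range(1, max_len + 1):
            for i in range(len(word) - n + 1):
                counts[word[i:i + n]] += 1
    return counts

def select_fragments(words, size, max_len=6):
    """Pick about `size` fragments for use as the basic vocabulary.

    Assumed rule: keep all single characters (so any word can still be
    covered), then add longer fragments by descending count * length.
    """
    counts = fragment_counts(words, max_len)
    singles = sorted(f for f in counts if len(f) == 1)
    longer = sorted(
        (f for f in counts if len(f) > 1),
        key=lambda f: (-counts[f] * len(f), f),
    )
    return singles + longer[:max(0, size - len(singles))]
```

With the three words "acoustic", "acoustics", and "acoustical", the
six-character fragments shared by all three words score highest, so
a small fragment vocabulary can cover the common stem once instead of
storing three separate terms.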

Also in progress is development of an operational 
retrospective search service for implementation by the 
National Science Library. Appreciation is expressed to the 
National Science Library for their financial support. 

2.2 - Automatic Question Modification 

A program for computer retrieval of papers in acoustics 
was used to search titles of papers that appeared in four 
journals between January 1955 and December 1967. The four 
journals were Journal of the Acoustical Society of America, 
Soviet Physics-Acoustics, Acustica, and Journal of Fluid 
Mechanics. Title words and author names were truncated to 
five letters. Search questions were allowed in the form of 
weighted terms connected by OR logic within parameters 
connected by AND or NOT logic. This data base, and a similar 
one for computing science literature, has been used in a 
number of subsequent studies including an examination of 
associative search methods. (D.M. Heaps and H.S. Heaps 1968; 
H.S. Heaps 1968). 
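
The question form described above can be sketched as follows. The
data layout is an assumption: a question is a list of parameters,
each tagged AND or NOT and holding weighted terms that are combined
by OR. Summing the weights of matching terms to rank a title is also
an assumption, since the abstract does not say how the weights were
used. Terms and title words are truncated to five letters as in the
original program.

```python
def truncate(word: str, n: int = 5) -> str:
    """Reduce a word to its first n letters, in upper case."""
    return word.upper()[:n]

def title_terms(title: str) -> set:
    """Return the set of truncated words in a title, ignoring punctuation."""
    stripped = (w.strip(".,;:()") for w in title.split())
    return {truncate(w) for w in stripped if w}

def evaluate(question, title):
    """Return the title's total weight, or None if the title is not retrieved.

    question: list of (logic, {term: weight}) pairs, logic being "AND" or "NOT".
    """
    terms = title_terms(title)
    total = 0
    for logic, weighted in question:
        # OR logic within a parameter: any matching term satisfies it.
        hits = [w for t, w in weighted.items() if truncate(t) in terms]
        if logic == "AND":
            if not hits:
                return None
            total += sum(hits)
        elif logic == "NOT":
            if hits:
                return None
        else:
            raise ValueError(f"unknown logic: {logic}")
    return total
```

Truncation trades storage for precision. With the question term
"underwater" in a NOT parameter, a title containing the unrelated
word "under" is also rejected, because both reduce to UNDER.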

Three measures of effectiveness of an information
retrieval system were formulated in terms of a user's
estimate of the relevance of items output. In each instance
the measure was expressed through certain parameters which
denote the weights attached to the question terms. The
parameters were then chosen to optimize effectiveness as
measured by user satisfaction. [illegible] measure of output
relevance, and for data [illegible] statistics. The techniques
that were used are analogous to those used to define a matched
filter [illegible]. (H.S. Heaps and W.C.C. Ko 1970; W.C.C. Ko,
Thesis 1970; H.S. Heaps 1971).

Work in progress by A. Lo relates to an automatic [illegible].



3. AUTOMATIC CLASSIFICATION

3.1 - Automatic Classification of Documents

Methods for automatic classification of documents
according to subject categories have been examined for a
[illegible] of papers published in the Journal
of the Acoustical Society of America during 1966, 1967, 1968,
and [illegible]. It was found that latent class analysis does
not form a useful technique. Attribute analysis as proposed by
Maron was found to be satisfactory with use of a proposed
[illegible] for choice of keywords from the titles. A modified
form of attribute analysis was based on maximization
of [illegible] documents with use of not
more than two keywords for the computation of joint word
[illegible] of joint occurrence. [illegible] the
classification efficiency was [illegible] application of
attribute analysis. (S. Akiyama, Thesis 1972).
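
For readers unfamiliar with keyword-based probabilistic
classification in the spirit of Maron's attribute analysis, a generic
sketch follows. It is a plain naive Bayes classifier with add-one
smoothing, offered as an assumption about the general family of
technique; it is not the modified procedure evaluated in the thesis,
which used joint occurrences of up to two keywords. The category
names and keywords in the usage note are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(documents):
    """Build counts from an iterable of (category, list_of_keywords) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for category, keywords in documents:
        doc_counts[category] += 1
        word_counts[category].update(keywords)
        vocabulary.update(keywords)
    return doc_counts, word_counts, vocabulary

def classify(model, keywords):
    """Return the category with the highest log-probability score."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in sorted(doc_counts):
        n_words = sum(word_counts[category].values())
        # Start from the prior probability of the category.
        score = math.log(doc_counts[category] / total_docs)
        for word in keywords:
            if word in vocabulary:
                # Add-one smoothing keeps unseen pairs from zeroing the product.
                p = (word_counts[category][word] + 1) / (n_words + len(vocabulary))
                score += math.log(p)
        if score > best_score:
            best, best_score = category, score
    return best
```

A model trained on a handful of titles labelled "noise" and "speech"
would, for example, assign a new title containing "vowel" and
"speech" to the "speech" category.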

Work in progress by [illegible] includes an examination
of the feasibility of document classification based on
[illegible] determination of significant features.

3.2 - Computer Assisted Medical Diagnosis

The problem of automatic diagnosis by use of a computer
has been expressed as an optimization problem in which
parameters are chosen to minimize the diagnosis errors in
reference to a previously treated set of patients. These
were expressed in terms of statistical measures of
[illegible] of symptoms, and of symptoms with diseases.
The choice of criterion was discussed, and a formula was derived
[illegible]. It was not necessary to make assumptions
regarding mutual exclusiveness of diseases or statistical
independence of symptoms. (H.S. Heaps 1971b).

of ^v.^ ^^Z""^ in progress by J. Cumberbatch includes application 
nJi-?«n^-^°^!K^^^°'^ ^° analyse hospital data relating to 300 
patients with respect to a disease symptom complex of six 
diseases and eleven symptoms. Appreciation ia expressed to 
Dr. p. A. Scheinok who made the data available. 

3.3 - Holographic Image Processing

Work in progress by D.K.K. Lam relates to pattern 
recognition through hologram interferometry and suitable 
processing of the image.
4. STUDY OF INDEXING AND CLASSIFICATION LANGUAGES
THROUGH COMPUTER MANIPULATION OF DATA BASES

An experimental computer program was written for on- 
line interactive construction of a thesaurus. The program 
allowed the user to create his thesaurus and use it at the 
same time. Such a program could be used in a library or 
information centre which required a thesaurus for word con- 
trol. The project was concerned with the development of 
an initial query language. If such an on-line thesaurus 
were to be run as a production system in a library, 
additional programs would be needed to link the thesaurus
to catalog search and to circulation. The program should
also be optimized. It was written in PL/1. (A.L.S. Wong
and D.M. Heaps, Report 1969).

A set of experimental programs was written to
manipulate MARC tapes. Three aspects were covered:
1. Programming to achieve fast code conversion from ASCII
to EBCDIC and to perform word counts. 2. Programming to
dump, strip, and relate fields from the MARC tape. 3.
Programming to reformat the MARC tapes to search them on
author and title using the programs developed at the
University of Alberta for searching the Chemical Titles tapes.
Tests were run and the results discussed with students in the
School of Library Science. (D.M. Heaps, V. Shapiro, D. Walker,
and F. Appleyard 1970).
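
The first of the three aspects, table-driven code conversion with a
word count, can be sketched in a few lines. This is a modern
illustration rather than the original program: it relies on Python's
built-in `cp037` codec as the EBCDIC variant, which is an assumption
about the code page, and it maps non-ASCII input bytes to the EBCDIC
substitute character 0x3F, which is also an assumption.

```python
from collections import Counter

# 256-entry translation table, built once from Python's cp037 (EBCDIC) codec.
# Bytes above 127 are not ASCII; mapping them to 0x3F (EBCDIC SUB) is an assumption.
ASCII_TO_EBCDIC = bytes(range(128)).decode("ascii").encode("cp037") + bytes([0x3F]) * 128

def ascii_to_ebcdic(data: bytes) -> bytes:
    """Convert ASCII bytes to EBCDIC with a single table lookup per byte."""
    return data.translate(ASCII_TO_EBCDIC)

def word_counts(text: str) -> Counter:
    """Count words, ignoring case and surrounding punctuation."""
    words = (w.strip(".,;:()[]\"'").lower() for w in text.split())
    return Counter(w for w in words if w)
```

Precomputing the table is what makes the conversion fast: each input
byte costs one lookup, with no per-character branching.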

An on-line thesaurus was developed to serve as a primary 
aid in classifying, indexing, and searching a specific water 
resource data base. Users are persons responsible for water 
resource management decisions. The data base contains material 
in bibliographic format of non-standard type. It includes 
research project descriptions, research grant applications, 
monographs, journal articles, abstracts of statutes, entire 
statutes, and so forth. The material is accessed and controlled, 
and new documents indexed and/or classified, through the on-
line thesaurus. The system has a thesaurus, a data base, and
a class structure (schedules). Programs and documentation are
available. Programs are written in IBM360 Assembler. Appreci-
ation is expressed to Environment Canada for financial assist-
ance. (F. Alber and D.M. Heaps 1971; F. Alber, Thesis 1972).

The availability of machine readable data bases, and of
increasingly sophisticated computer programs and methods of
operation, has made possible the investigation of the
indexing and classification processes through manipulation
of data bases by computers. A study was carried out that
used data bases such as the MARC tapes, UDC schedules, a UDC
indexed data base, and a water resource thesaurus. A
methodology was developed to test the [illegible] of the
LC and UDC classifications for control of a water resource
document collection. [illegible] Water Resources
Thesaurus and the UDC schedules. Appreciation is expressed
to Environment Canada for financial assistance, and to FID
and R. Freeman who made [illegible] available. (M. Mercier,
G.A. Cooke, and D.M. Heaps 1971; M. Mercier, Thesis 1972).

[illegible] to the use of networks and [illegible]
investigation of two important problems. These are:
[illegible] of the network to the user [illegible] of the
resources, and [illegible] or conversion of information from
one level [illegible] vocabulary to another. Machine-readable
[illegible] being used, and techniques developed to allow
[illegible] mapping. One 'family' of concepts is mapped on
another and [illegible] produced. It is postulated that the
[illegible], and the lexicon, will assist classification and
retrieval. The data bases used are the LC Subject Heading
Tape and the Water Resources Thesaurus.
5. LIBRARY AUTOMATION

A preliminary report outlined a command structure
for use with an on-line library automation system. The
commands included ones that were user oriented, circulation
oriented, and order oriented. (D.M. Heaps and H.S. Heaps,
Report 1968).

A pilot study was undertaken to determine the
feasibility of designing a system to automate card catalog
searching in a library using the Universal Decimal
Classification, in both batch and on-line mode. The use of
the UDC in computer searching of the file was examined. An
attempt was made to computerize a section of the card catalog
as it existed and no changes were made in its basic format.
No attention was given to a program for handling the
circulation of books. The unique features of the UDC were
only marginally explored, but the colon concatenation was
utilized. The study was carried out in cooperation with the 
Boreal Institute Library. (D.M. Heaps, L.S. Easton, E.R. 
Macallister, and R.L. Pallister 1970). 
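
A small sketch may help show why colon concatenation matters for
computer searching: a notation that links two class numbers with a
colon should be retrievable under either number. The code below is an
assumption about one reasonable approach, not a description of the
pilot system. It indexes every component separately and treats a
query as a hierarchical prefix. The class numbers in the usage note
are arbitrary sample notations.

```python
def udc_components(notation: str) -> list:
    """Split a UDC notation on the colon that links related class numbers."""
    return [part.strip() for part in notation.split(":") if part.strip()]

def build_index(catalog: dict) -> dict:
    """Map each component class number to the set of record ids that carry it."""
    index = {}
    for record_id, notation in catalog.items():
        for number in udc_components(notation):
            index.setdefault(number, set()).add(record_id)
    return index

def search(index: dict, query: str) -> set:
    """Return records having a component equal to, or subordinate to, `query`."""
    found = set()
    for number, records in index.items():
        # A longer number that begins with the query lies below it in the hierarchy.
        if number.startswith(query):
            found |= records
    return found
```

For a catalog such as `{1: "551.32:624.14", 2: "624.14", 3: "551.46"}`
a search on `551` returns records 1 and 3, and a search on `624.14`
returns records 1 and 2, so the compound entry is found from either
side of the colon.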

Some of the most serious problems in library auto- 
mation arise because of the size of the files that must be 
stored on computer accessible devices. A discussion was 
presented of the techniques and approaches available for 
optimizing the storage requirements for bibliographic data 
with the use of data compression. Reference was also made
to file organization and access methods. Coding and decoding 
problems were dealt with. (W.D. Reid and H.S. Heaps 1971). 

A study of the application of on-line computer systems 
to library automation was divided into two parts. The first 
part involved the design, implementation, and evaluation of an 
integrated library automation system, the IT System, to
encompass an on-line catalog subsystem and a real-time
circulation subsystem. Implementation was in the Departmental
library. The second part of the study involved the design of a
computer file structure to support an on-line integrated library
automation system capable of serving the needs of a large
academic library. The proposed design was confined to the files
necessary for the support of an on-line catalog subsystem and a
real-time circulation subsystem. The design attempted to
produce a file structure economical in terms of storage
requirements and CPU time, and also able to provide very short
response times for most on-line transactions and queries. Con-
siderable attention was given to a new method for construction
of the inverted index files of the on-line catalog subsystem.
Description of the method, based on the principles of virtual
hash addressing, covered the detailed structures of the index
files for authors, titles, and LC call numbers, the procedures
used in generation, maintenance, and search of these files,
and the theoretical performance in terms of storage require-
ments and file access times. (J.J. Dimsdale, Thesis 1971).

Work in progress by J.J. Dimsdale is development of 
an optimal design for the on-line catalog subsystem of a
comprehensive on-line library automation system for a library 
that contains approximately one million titles. It is 
supposed that the total system also includes an on-line 
acquisition subsystem, a real-time circulation system, and
possibly an on-line cataloging subsystem. It is proposed
to develop a cost-time function, and to use it to provide
for optimal trade-off between such factors as file storage
cost, updating time and cost, and query response times.

Work in progress by J. A. Benbow is the design and 
implementation of an integrated on-line automated library 
system for use with the library of the Boreal Institute at
the University of Alberta. This library contains approxi-
mately 7,000 books and more than 10,000 documents in
addition to reports, periodicals, newspaper clippings, and
maps. It is indexed through use of the Scott's Index
variation of the standard UDC classification. The auto-
mated system will include a catalog search subsystem, a
computer assisted cataloging subsystem, a real-time circulation
subsystem, and an acquisition subsystem.

Also in progress is development of a set of computer 
programs for bibliographic processing for use in the library 
of the Boreal Institute. Appreciation is expressed to the 
Donner Canadian Foundation for their financial support. 
6. INFORMATION TRANSFER PROCESSES AND NATIONAL NETWORKS

6.1 - Small Scale Documentation

A study was made of the retrieval needs of a small
research group who used and produced charts. A small manual
coordinate-index type system was to be controlled by the
research scientists with the aid of one clerical assistant.
Problems encountered were the need for instruction of
the scientists in techniques of document control, the
difficulties of controlling charts in every stage of pro-
duction, and the requirement for control of, and personal
entry into, the system by the working scientists. (D.M.
Heaps, Report 1968).

Experimental work in query languages for on-line 
personal documentation was carried out at the University 
of Alberta in 1967 and 1968. The computer used was an 
IBM 360/67 with on-line typewriter terminals. The data 
base comprised the personal files of one faculty member. 
The investigation revealed certain fundamental differences 
in query languages. It was found that query languages for
large scale batch systems tend to be fixed, but may be
complicated. Query languages for small scale personal
systems must be simple, but must allow for changes. The
query languages tested allowed various forms of question
logic. Basic query language instructions were grouped
under three general modes: START, CHANGE, and DISPLAY. It was
decided that an efficient query language must direct the
search, keep track of purging, allow for choice in change,
telescope instructions, correct errors, and serve as its
own manual of instruction. Initial programming was in APL
as this was the only language then available through
terminals. (D.M. Heaps and P. Sorenson 1968; D.M. Heaps
and W. Harris, Report 1969).

6.2 - Information Transfer

Methods of information handling have always arisen 
from the historical needs of the time. It is suggested, 
moreover, that information has always represented power. 
The organization of the information and the type thought to 
be useful reflects the current social organization and demands. 
Until the late 1940's information regarded as valuable was 
hierarchically arranged and was predominately legal, literary, 
historical or political, and predominately published in 
monographs. This period is defined as the CLASSICAL period. 
With the great importance of science and technology in the
1950's classification broke down; information was needed in
discrete units (reports and articles), and was then discarded.
This period is defined as the MODERN period. It is postulated
[illegible].

[illegible] on information retrieval experiments
[illegible] experimental SDI service. The
questions asked, answers received, analysis of relevance,
and lessons learned were discussed in some detail. (G.A.
Cooke and D.M. Heaps 1970).

[illegible] various types of users with suitable data
banks will [illegible] of information
networks. The [illegible] documents in the data banks display
characteristic styles. A system has been described whereby
certain fundamental characteristics of style are recognized by
[illegible] to [illegible] a pattern output [illegible]
bases of many present day data banks. (D.M. Heaps and W. Ingram
[year illegible]).

Work in progress by W. Ingram involves further
implementation and testing of the system for recognition
[illegible].

6.3 - National Policies and Education

[illegible] major industrialized countries have carried
out a series of studies on scientific and technical information
[illegible] science policy. Canada has taken part in this
process with studies largely organized by the Federal
Government. The more significant Canadian studies, their
[illegible] interaction with national policy,
[illegible] and education of information
scientists have been outlined. The basic aim was to elicit
Canadian official and non-official policy. Certain subjective
[illegible] were expressed. (D.M. Heaps and G.A. Cooke
[year illegible]). [illegible] scientists have been discussed
([citation illegible]).

[illegible] education in Canada has been
[illegible]. [illegible] inclusion of information science
within a [illegible] university curriculum, and the operation
[illegible] to demonstrate automatic techniques, has
[illegible]. ([citation illegible] 1970; J. Heyworth
1971; D.M. Heaps and J. Heyworth 1971).